ECAGP Air Quality Analysis
1 Overview
This code gives an overview of:
- Loading air quality data from QuantAQ instruments
- Performing initial summary analysis
- Loading meteorology data from public data sources
- Combining air quality and meteorology data
- Conducting exploratory analysis of the dynamics of air pollutants in relation to time and meteorology
2 AIR QUALITY DATA LOADING
2.1 Loading initial packages
2.2 Load Air Quality Data file.
2.3 Summary statistics
## pm1 pm25 pm10
## Min. : 0.054 Min. : 0.221 Min. : 0.221
## 1st Qu.: 3.156 1st Qu.: 5.319 1st Qu.: 15.425
## Median : 5.010 Median : 7.705 Median : 26.910
## Mean : 6.156 Mean : 8.792 Mean : 41.722
## 3rd Qu.: 8.046 3rd Qu.: 11.129 3rd Qu.: 46.568
## Max. :270.678 Max. :351.924 Max. :8277.649
## NA's :35 NA's :35 NA's :35
## timestamp_local.x sn.x
## Min. :2024-07-01 20:26:12.00 Length:508006
## 1st Qu.:2024-08-17 15:36:42.75 Class :character
## Median :2024-10-01 21:50:38.00 Mode :character
## Mean :2024-10-07 11:58:00.75
## 3rd Qu.:2024-11-23 08:20:55.50
## Max. :2025-02-05 23:59:32.00
##
## timestamp.x met.xrh met.xtemp
## Min. :2024-07-01 20:26:12.00 Min. : 9.56 Min. :-3.14
## 1st Qu.:2024-08-17 19:36:42.75 1st Qu.:46.21 1st Qu.:22.84
## Median :2024-10-02 01:50:38.00 Median :63.25 Median :28.34
## Mean :2024-10-07 15:42:28.66 Mean :60.34 Mean :27.21
## 3rd Qu.:2024-11-23 13:20:55.50 3rd Qu.:75.11 3rd Qu.:32.22
## Max. :2025-02-06 04:59:32.00 Max. :99.04 Max. :46.41
##
## pm1num sn lat lon
## Min. : 0.000 Length:508006 Min. :29.73 Min. :-95.24
## 1st Qu.: 7.993 Class :character 1st Qu.:29.73 1st Qu.:-95.24
## Median : 13.094 Mode :character Median :29.73 Median :-95.24
## Mean : 16.852 Mean :29.73 Mean :-95.24
## 3rd Qu.: 21.116 3rd Qu.:29.73 3rd Qu.:-95.24
## Max. :256.040 Max. :29.73 Max. :-95.24
##
## sitename mod_date_1min original_met_time
## Length:508006 Min. :2024-07-01 20:26:00.00 Length:508006
## Class :character 1st Qu.:2024-08-17 15:37:00.00 Class :character
## Mode :character Median :2024-10-01 21:51:00.00 Mode :character
## Mean :2024-10-07 12:18:11.82
## 3rd Qu.:2024-11-23 08:21:00.00
## Max. :2025-02-06 00:00:00.00
##
## tmpc wd ws timestamp_local.y
## Min. :-6.67 Min. : 0 Min. : 0.000 Length:508006
## 1st Qu.:19.44 1st Qu.: 40 1st Qu.: 2.056 Class :character
## Median :25.56 Median :120 Median : 3.084 Mode :character
## Mean :23.81 Mean :129 Mean : 3.148
## 3rd Qu.:28.89 3rd Qu.:180 3rd Qu.: 4.112
## Max. :38.89 Max. :360 Max. :26.213
## NA's :13 NA's :13 NA's :13
## date name
## Min. :2024-07-01 20:26:12.00 Length:508006
## 1st Qu.:2024-08-17 15:36:42.75 Class :character
## Median :2024-10-01 21:50:38.00 Mode :character
## Mean :2024-10-07 12:18:05.05
## 3rd Qu.:2024-11-23 08:20:55.50
## Max. :2025-02-05 23:59:32.00
##
2.4 Date formatting
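Below is a minimal sketch of the date-formatting step. It is not the notebook's actual code. It assumes local timestamps arrive as text and that the site is in the America/Chicago time zone. In the summary above, `mod_date_1min` looks like `timestamp_local.x` rounded to the nearest minute (2025-02-05 23:59:32 becomes 2025-02-06 00:00:00). The `.x`/`.y` suffixes are the defaults for both base R `merge()` and dplyr joins, so the join is sketched with `merge()`. The small data frames are hypothetical.

```r
# Parse local-time strings (format "YYYY-MM-DD HH:MM:SS" assumed) as POSIXct.
# The America/Chicago zone is an assumption for the Houston-area site.
parse_local_time <- function(x, tz = "America/Chicago") {
  as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = tz)
}

# Round to the nearest minute; round() on POSIXct returns POSIXlt, so convert back
round_to_minute <- function(t) {
  as.POSIXct(round(t, "mins"))
}

# Hypothetical air quality rows (timestamps taken from the summary above)
aq <- data.frame(
  timestamp_local = c("2024-07-01 20:26:12", "2025-02-05 23:59:32"),
  pm10 = c(12.1, 30.4)
)
aq$date <- parse_local_time(aq$timestamp_local)
aq$mod_date_1min <- round_to_minute(aq$date)

# Hypothetical meteorology rows already on a 1-minute grid
met <- data.frame(
  mod_date_1min = parse_local_time(c("2024-07-01 20:26:00", "2025-02-06 00:00:00")),
  ws = c(3.1, 2.0),
  wd = c(120, 40)
)

# Left join: keep every air quality row and attach met values by minute
combined <- merge(aq, met, by = "mod_date_1min", all.x = TRUE)
```

A left join keeps every instrument reading even when no met record matches, which is consistent with the small NA counts for `tmpc`, `wd` and `ws` in the summary.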
2.5 Time series
3 CLEANING STEPS
3.1 Define threshold values
3.2 Remove outliers
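The outlier step could look like the sketch below, assuming each threshold is stored as a min/max pair per column. The limits are hypothetical placeholders, chosen only so that the 8277.649 PM10 maximum in the summary would be flagged. They are not the values defined in 3.1. Out-of-range values are set to `NA` rather than dropping rows, so the timestamps stay intact.

```r
# Hypothetical thresholds: placeholders, not the values used in this analysis
thresholds <- list(
  pm1  = c(min = 0, max = 500),
  pm25 = c(min = 0, max = 500),
  pm10 = c(min = 0, max = 1000)
)

# Replace values outside [min, max] with NA, column by column
remove_outliers <- function(df, thresholds) {
  for (col in names(thresholds)) {
    lim <- thresholds[[col]]
    x <- df[[col]]
    bad <- !is.na(x) & (x < lim[["min"]] | x > lim[["max"]])
    df[[col]][bad] <- NA
  }
  df
}

# Example rows using values close to the summary's quartiles and maxima
aq <- data.frame(pm1 = c(3.2, 270.7), pm25 = c(5.3, 351.9), pm10 = c(15.4, 8277.6))
cleaned <- remove_outliers(aq, thresholds)
```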
3.3 Sanity-check the time series: did the cleaning work?
3.4 Did you find any anomalous time periods that need to be removed from the data? If so, filter them out by date range.
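Here is a small sketch of filtering out a date range. It uses the 12/16-12/18 window examined later in this notebook as the example. It assumes a POSIXct `date` column in local time. The end bound is exclusive, so passing "2024-12-19" removes all of 12/16 through 12/18.

```r
# Drop rows whose timestamp falls in [start, end); rows with NA dates are kept
drop_period <- function(df, start, end, tz = "America/Chicago") {
  start_t <- as.POSIXct(start, tz = tz)
  end_t   <- as.POSIXct(end, tz = tz)
  in_window <- !is.na(df$date) & df$date >= start_t & df$date < end_t
  df[!in_window, , drop = FALSE]
}

# Hypothetical example rows around the December event
aq <- data.frame(
  date = as.POSIXct(c("2024-12-15 23:00", "2024-12-17 08:00", "2024-12-19 00:00"),
                    tz = "America/Chicago"),
  pm10 = c(40, 210, 35)
)
filtered <- drop_period(aq, "2024-12-16", "2024-12-19")
```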
4 EXPLORATORY DATA ANALYSIS
We can answer a number of questions with air quality data. Some examples include: What is the air quality like now? Where is the air quality bad (now/typically)? When was AQ bad? What time of day should I (not) go outside? Where is my pollution coming from? How many bad pollution days were there this year? What fraction of the time was AQ good, bad, or in the middle? We’ll use the R package openair to explore answers to these questions with data.
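The last question ("What fraction of the time was AQ good, bad, or in the middle?") reduces to binning concentrations and tabulating. The cut points below are an assumption borrowed from the PM2.5 values quoted later in this notebook: 9.0 µg/m³ annual primary and 35 µg/m³ 24-hour. They are not the official AQI categories.

```r
# Bin PM2.5 into three illustrative categories: (-Inf, 9], (9, 35], (35, Inf)
categorize_pm25 <- function(pm25, breaks = c(-Inf, 9, 35, Inf),
                            labels = c("good", "middle", "bad")) {
  cut(pm25, breaks = breaks, labels = labels, right = TRUE)
}

# Percentage of (non-missing) observations in each category
time_fraction <- function(pm25) {
  cats <- categorize_pm25(pm25)
  round(100 * prop.table(table(cats)), 1)
}

fractions <- time_fraction(c(5.3, 7.7, 11.1, 40.2))
```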
4.0.1 Calendar Plots: When was air quality bad? How many bad days were there in the last year?
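A calendar plot colours each day by its daily mean. Counting "bad days" is then a threshold on those means. The sketch below assumes a POSIXct `date` column and uses the PM10 24-hour standard of 150 µg/m³ as the example limit. Note that `as.Date()` on POSIXct defaults to UTC, so the local zone is passed explicitly.

```r
# Daily mean of one pollutant, with days defined in local time (zone assumed)
daily_means <- function(df, pollutant, tz = "America/Chicago") {
  day <- as.Date(df$date, tz = tz)
  out <- aggregate(df[[pollutant]], by = list(day = day), FUN = mean, na.rm = TRUE)
  names(out)[2] <- pollutant
  out
}

# Number of days whose mean is strictly above the limit
count_bad_days <- function(df, pollutant, limit) {
  d <- daily_means(df, pollutant)
  sum(d[[pollutant]] > limit, na.rm = TRUE)
}

# Hypothetical example: one day averaging 170, one averaging 60
df <- data.frame(
  date = as.POSIXct(c("2024-07-04 10:00", "2024-07-04 14:00", "2024-07-05 09:00"),
                    tz = "America/Chicago"),
  pm10 = c(140, 200, 60)
)
bad_days <- count_bad_days(df, "pm10", limit = 150)
```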
4.1 When PM1 was bad, what else was bad?
4.1.1 Explore some scatterplots
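Alongside the scatterplots, a quick numeric check can compare conditions when PM1 is high against the full record. Defining "bad" as PM1 above its 90th percentile is an assumption made for this sketch. The column names follow the summary above.

```r
# Mean of other variables when the trigger pollutant exceeds its `prob` quantile,
# next to their overall means
compare_when_high <- function(df, trigger = "pm1",
                              others = c("pm25", "pm10", "ws"), prob = 0.9) {
  cutoff <- quantile(df[[trigger]], prob, na.rm = TRUE)
  high <- !is.na(df[[trigger]]) & df[[trigger]] > cutoff
  data.frame(
    variable  = others,
    mean_high = sapply(others, function(v) mean(df[[v]][high], na.rm = TRUE)),
    mean_all  = sapply(others, function(v) mean(df[[v]], na.rm = TRUE)),
    row.names = NULL
  )
}
```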
4.2 Diurnal Profiles: When is air quality (typically) bad? When is it typically (not) safe to go outside?
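A diurnal profile is the mean concentration for each hour of the day. openair's `timeVariation()` draws a fuller version with uncertainty bands. The base-R sketch below only computes the hourly means, assuming a POSIXct `date` column in the America/Chicago zone.

```r
# Mean concentration by local hour of day (only hours present in the data appear)
diurnal_profile <- function(df, pollutant, tz = "America/Chicago") {
  hour <- as.integer(format(df$date, "%H", tz = tz))
  means <- tapply(df[[pollutant]], hour, mean, na.rm = TRUE)
  data.frame(hour = as.integer(names(means)), mean = as.numeric(means))
}
```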
4.2.1 TrendLevel - when was air quality typically bad?
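By default, openair's `trendLevel()` shows a pollutant's mean by month (x axis) and hour (y axis). The numbers behind that heat map can be sketched as a month by hour table of means. The time zone is an assumption, and empty cells come back as `NA`.

```r
# Month x hour matrix of mean concentrations (rows "01".."12", columns "00".."23")
month_hour_grid <- function(df, pollutant, tz = "America/Chicago") {
  month <- factor(format(df$date, "%m", tz = tz))
  hour  <- factor(format(df$date, "%H", tz = tz))
  tapply(df[[pollutant]], list(month = month, hour = hour), mean, na.rm = TRUE)
}
```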
4.3 Directional analysis of pollutants: Where is pollution bad? And where is pollution coming from?
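As a rough numeric companion to the polar plots, the sketch below bins wind direction into 8 compass sectors and averages PM10 in each. This is a deliberately simpler technique than `polarPlot()`, which fits a smooth surface over wind speed and direction. It assumes `wd` is in degrees and follows the meteorological convention (the direction the wind blows from).

```r
# Mean pollutant concentration per 45-degree wind sector
sector_means <- function(df, pollutant = "pm10") {
  labels <- c("N", "NE", "E", "SE", "S", "SW", "W", "NW")
  width <- 360 / length(labels)  # 45 degrees per sector
  # Shift by half a sector so "N" covers 337.5 up to 22.5 degrees
  idx <- floor(((df$wd + width / 2) %% 360) / width) + 1
  sector <- factor(labels[idx], levels = labels)
  tapply(df[[pollutant]], sector, mean, na.rm = TRUE)
}
```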
4.3.1 Create polar plots (and other things in that family)
## # A tibble: 4 × 5
## cluster mean_pm10 n n_percent pm10_percent
## <chr> <dbl> <int> <dbl> <dbl>
## 1 C1 49.3 240 0 0.1
## 2 C2 36.1 7837 1.5 1.3
## 3 C3 48.3 312447 61.5 71.2
## 4 C4 31.0 187434 36.9 27.4
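The table's percentage columns can be reproduced from per-row cluster labels. `n_percent` is each cluster's share of observations, and `pm10_percent` is its share of the summed PM10. For example, C3's 312447 rows at a mean of 48.3 make up about 71% of the total across all four clusters. In this notebook the labels presumably come from openair's `polarCluster()`; the sketch simply takes them as an input vector.

```r
# Summarise PM10 by cluster label, matching the columns of the table above
cluster_summary <- function(cluster, pm10) {
  ok <- !is.na(cluster) & !is.na(pm10)
  cluster <- cluster[ok]
  pm10 <- pm10[ok]
  groups <- sort(unique(cluster))
  n <- sapply(groups, function(g) sum(cluster == g))
  total <- sapply(groups, function(g) sum(pm10[cluster == g]))
  data.frame(
    cluster      = groups,
    mean_pm10    = total / n,
    n            = n,
    n_percent    = round(100 * n / sum(n), 1),
    pm10_percent = round(100 * total / sum(total), 1),
    row.names    = NULL
  )
}
```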
4.3.2 Create Polar map plots
5 WQ
6 More General EDA
6.1 Scatter plot (work in progress)
6.2 Time series of all PM levels
- The detail in PM1 and PM2.5 was not visible
- EPA 24-hour limits (PM2.5: 35 µg/m³, PM10: 150 µg/m³) in blue; average level excluding the 12/16-12/18 data in red
- “This standard should not be exceeded more than once per year on average over three years”
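The two reference lines on this plot can be sketched as follows. The EPA limit is a constant, and the red line is the mean with the 12/16-12/18 window left out. Column names and the local time zone are assumptions, and `plot_with_limits()` is a hypothetical helper, not the notebook's plotting code.

```r
# Mean of a pollutant with the window [start, end) excluded
mean_excluding <- function(df, pollutant, start, end, tz = "America/Chicago") {
  in_window <- !is.na(df$date) &
    df$date >= as.POSIXct(start, tz = tz) & df$date < as.POSIXct(end, tz = tz)
  mean(df[[pollutant]][!in_window], na.rm = TRUE)
}

# Base-graphics time series with the EPA limit (blue) and the trimmed mean (red)
plot_with_limits <- function(df, pollutant, limit, start, end) {
  plot(df$date, df[[pollutant]], type = "l", xlab = "Date", ylab = pollutant)
  abline(h = limit, col = "blue")
  abline(h = mean_excluding(df, pollutant, start, end), col = "red")
}
```

For PM10 this would be called as `plot_with_limits(df, "pm10", 150, "2024-12-16", "2024-12-19")`.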
6.3 Histogram of Daily Average PM10 Levels
- EPA uses a 24-hour standard of 150 µg/m³.
- This standard should not be exceeded more than once per year on average over three years.
- The data show it was exceeded twice within the 7-month period (2024-07-31: 145.164072)
- Per-sensor averages: MOD-PM-01395 38.37543, MOD-PM-01396 45.22899; overall average PM10: 41.80221
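The overall average printed below equals the unweighted mean of the two sensor averages: (38.37543 + 45.22899) / 2 = 41.80221. This weights both sensors equally regardless of how many rows each has. The sketch below assumes, as one plausible reading, that per-sensor averages are taken over daily means. It also counts daily exceedances of the 150 µg/m³ standard. The column names `sn` and `date` follow the summary above.

```r
# Daily mean per sensor (days in local time, zone assumed)
daily_by_sensor <- function(df, pollutant, tz = "America/Chicago") {
  day <- as.Date(df$date, tz = tz)
  out <- aggregate(df[[pollutant]], by = list(sn = df$sn, day = day),
                   FUN = mean, na.rm = TRUE)
  names(out)[3] <- "daily_avg"
  out
}

# Per-sensor average, equal-weight overall average, and exceedance count
summarise_daily <- function(daily, limit = 150) {
  per_sensor <- tapply(daily$daily_avg, daily$sn, mean, na.rm = TRUE)
  list(
    per_sensor  = per_sensor,
    overall     = mean(per_sensor),
    exceedances = sum(daily$daily_avg > limit, na.rm = TRUE)
  )
}
```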
## # A tibble: 2 × 2
## sn overall_avg
## <chr> <dbl>
## 1 MOD-PM-01395 38.4
## 2 MOD-PM-01396 45.2
## Overall Average PM10: 41.80221
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
6.4 Histogram of Daily Average PM2.5 Levels
- Primary limit: 9.0 µg/m³; secondary limit: 15.0 µg/m³ (annual mean, averaged over 3 years)
- The average over the 7-month period is 8.807473 µg/m³
- Per-sensor averages: MOD-PM-01395 9.537952, MOD-PM-01396 8.076994; overall average PM2.5: 8.807473
## # A tibble: 2 × 2
## sn overall_avg
## <chr> <dbl>
## 1 MOD-PM-01395 9.54
## 2 MOD-PM-01396 8.08
## Overall Average PM2.5: 8.807473
7 12/16-12/18 Data
7.1 Time series of 12/16-12/18 with epa limits
- In the PM2.5 and PM10 calendar plots, 12/16-12/18 stands out
- Sensor 1395 is missing data from Dec 16 until Dec 17 at 9 am
- Sensor 1396 is more toward the southeast and generally has higher readings
- For 1396, the blue and red lines overlap
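The row counts printed below (914 vs 2875) can be put in context by comparing them with the number of minutes in the window. The sketch assumes roughly 1-minute reporting. That cadence is an assumption, not a documented instrument setting. Sensors with no rows in the window still appear with a count of 0.

```r
# Rows per sensor inside [start, end) and percent of expected 1-minute slots
window_coverage <- function(df, start, end, tz = "America/Chicago") {
  s <- as.POSIXct(start, tz = tz)
  e <- as.POSIXct(end, tz = tz)
  in_window <- !is.na(df$date) & df$date >= s & df$date < e
  rows <- table(factor(df$sn[in_window], levels = sort(unique(df$sn))))
  expected <- as.numeric(difftime(e, s, units = "mins"))
  data.frame(sn = names(rows), rows = as.integer(rows),
             coverage_pct = round(100 * as.integer(rows) / expected, 1))
}
```

For the December event this would be `window_coverage(df, "2024-12-16", "2024-12-19")`.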
## Sensor MOD-PM-01395 rows: 914
## Sensor MOD-PM-01396 rows: 2875
7.2 Polar Plots of 12/16-12/18
- Both sensors show high concentrations from the bottom right (southeast)
- The signal is much stronger at 1396
## Sensor MOD-PM-01395 rows: 914
## # A tibble: 4 × 5
## cluster mean_pm10 n n_percent pm10_percent
## <chr> <dbl> <int> <dbl> <dbl>
## 1 C1 71.4 220 24.1 19
## 2 C2 74.2 425 46.5 38.2
## 3 C3 86.5 53 5.8 5.5
## 4 C4 143. 216 23.6 37.3
## Sensor MOD-PM-01396 rows: 2875
## # A tibble: 4 × 5
## cluster mean_pm10 n n_percent pm10_percent
## <chr> <dbl> <int> <dbl> <dbl>
## 1 C1 125. 508 17.7 14.6
## 2 C2 62.7 238 8.3 3.4
## 3 C3 138. 1148 39.9 36.4
## 4 C4 202. 981 34.1 45.6
7.3 Annulus polar plot map
## Warning: There were 2 warnings in `dplyr::mutate()`.
## The first warning was:
## ℹ In argument: `data = purrr::map(data, prepare.grid)`.
## Caused by warning:
## ! There was 1 warning in `mutate()`.
## ℹ In argument: `data = purrr::map(data, prepare.grid)`.
## ℹ In group 1: `default = 15 December 2024 to 17 December 2024`.
## Caused by warning in `smooth.construct.cc.smooth.spec()`:
## ! basis dimension, k, increased to minimum possible
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
8 Missing entries
- The time series show substantial gaps, visible as straight lines joining the points on either side
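Gaps can be listed directly instead of spotted as straight lines. The sketch below flags any interval between consecutive timestamps longer than `max_gap_mins`. The 5-minute default is an arbitrary assumption, and it should be run on one sensor at a time.

```r
# Intervals between consecutive readings longer than max_gap_mins
find_gaps <- function(times, max_gap_mins = 5) {
  times <- sort(times[!is.na(times)])
  dt <- as.numeric(diff(times), units = "mins")
  idx <- which(dt > max_gap_mins)
  data.frame(gap_start = times[idx], gap_end = times[idx + 1], gap_mins = dt[idx])
}
```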
8.1 Histogram of Entries by hour
- Entries jump up around hour 8 and gradually decrease at night
- This is attributed to the solar-powered sensors not having a battery
- QuantAQ may have a program to upgrade the batteries for free
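The hourly entry counts in the table below can be produced by tabulating rows per local hour and sorting. The sketch keeps all 24 hours, including any with zero rows, so missing hours are visible. The time zone is an assumption.

```r
# Rows per local hour of day (0-23), most entries first
entries_by_hour <- function(times, tz = "America/Chicago") {
  hour <- as.integer(format(times, "%H", tz = tz))
  counts <- table(factor(hour, levels = 0:23))
  out <- data.frame(hour = 0:23, entries = as.integer(counts))
  out[order(-out$entries), , drop = FALSE]
}
```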
## # A tibble: 24 × 2
## hour entries
## <int> <int>
## 1 11 23134
## 2 12 23113
## 3 13 23029
## 4 10 22959
## 5 14 22926
## 6 15 22809
## 7 9 22734
## 8 16 22483
## 9 19 22296
## 10 17 22281
## # ℹ 14 more rows
9 Where to look now?
9.1 Trendline without (12/16-12/18)
- In December, PM10 at 3-5 am drops but remains noticeable
- February now stands out at 0-2 am, which we will look into
9.1.1 Feb
- There are only 6 days of data, so they do not fully represent February
- A large spike on the 5th skews the data
9.1.2 Trendline without (12/16-12/18) and Feb.
- Even after removing all the outlier data, December still shows a high mean PM10 from 3-5 am
9.2 Polar Plots of PM10 by season
- High PM10 tends to come from the southeast of both sensors
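Seasonal panels depend on how months are grouped. The sketch below uses the Northern Hemisphere grouping that openair applies for `type = "season"`: spring MAM, summer JJA, autumn SON, winter DJF. It also tabulates mean PM10 per sensor and season. The time zone and column names are assumptions.

```r
# Map each timestamp to a meteorological season (Northern Hemisphere)
assign_season <- function(times, tz = "America/Chicago") {
  m <- as.integer(format(times, "%m", tz = tz))
  lookup <- c("winter", "winter", "spring", "spring", "spring", "summer",
              "summer", "summer", "autumn", "autumn", "autumn", "winter")
  factor(lookup[m], levels = c("spring", "summer", "autumn", "winter"))
}

# Sensor x season matrix of mean concentrations (NA where there is no data)
season_means <- function(df, pollutant = "pm10") {
  tapply(df[[pollutant]], list(sn = df$sn, season = assign_season(df$date)),
         mean, na.rm = TRUE)
}
```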
9.3 Histograms of 10-min and 60-min moving averages
## # A tibble: 220 × 2
## date daily_peak
## <date> <dbl>
## 1 2024-07-02 217.
## 2 2024-07-03 183.
## 3 2024-07-04 452.
## 4 2024-07-05 573.
## 5 2024-07-06 217.
## 6 2024-07-07 69.5
## 7 2024-07-08 122.
## 8 2024-07-09 62.7
## 9 2024-07-10 54.3
## 10 2024-07-11 206.
## # ℹ 210 more rows
## # A tibble: 220 × 2
## date daily_peak
## <date> <dbl>
## 1 2024-07-02 133.
## 2 2024-07-03 132.
## 3 2024-07-04 171.
## 4 2024-07-05 303.
## 5 2024-07-06 161.
## 6 2024-07-07 53.8
## 7 2024-07-08 61.5
## 8 2024-07-09 53.7
## 9 2024-07-10 49.6
## 10 2024-07-11 80.9
## # ℹ 210 more rows
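The two daily-peak tables above (10-min peaks higher than 60-min peaks) follow from taking the daily maximum of a trailing moving average. The sketch below assumes one sensor's data at a regular 1-minute cadence, so `width = 10` approximates 10 minutes. That cadence is an assumption: gaps would need filling with `NA` rows first, and any window touching an `NA` yields `NA`. It then picks the top days, as used in 9.3.1.

```r
# Trailing moving average over `width` samples; stats::filter is namespaced
# because dplyr's filter() masks it when dplyr is attached
trailing_mean <- function(x, width) {
  as.numeric(stats::filter(x, rep(1 / width, width), sides = 1))
}

# Daily maximum of the moving average (days in local time, zone assumed)
daily_peaks <- function(df, pollutant, width, tz = "America/Chicago") {
  df <- df[order(df$date), ]
  ma <- trailing_mean(df[[pollutant]], width)
  day <- as.Date(df$date, tz = tz)
  peaks <- tapply(ma, day, function(v) if (all(is.na(v))) NA_real_ else max(v, na.rm = TRUE))
  data.frame(date = as.Date(names(peaks)), daily_peak = as.numeric(peaks))
}

# Top n days by peak (NA peaks sort last)
top_days <- function(peaks, n = 10) {
  head(peaks[order(-peaks$daily_peak), ], n)
}
```

Usage would be `top_days(daily_peaks(df, "pm10", width = 10))` for the 10-minute version and `width = 60` for the hourly one.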
9.3.1 time series of the top 10 days from moving averages
## Warning in grabDL(warn, wrap, wrap.grobs, ...): one or more grobs overwritten
## (grab WILL not be faithful; try 'wrap.grobs = TRUE')
10 Hourly polar plots for PM10
- k = 100 for the seasonal plot
- k = 50 for the hourly plot
- Note that the default is k = 100
- With k = 100, the summer data disappear, hours 00-07 in autumn disappear, and winter shows only hour 13 for sensor 1396
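One way to check whether the blank panels coincide with sparse data is to count observations per panel before choosing `k`. Whether sparsity is the actual cause is an unverified hypothesis, although the "basis dimension, k, increased" warnings in 7.3 point in that direction. The sketch assumes `sn`, `season` and `hour` columns already exist, for example built with `format(date, "%H")` and a season mapping.

```r
# Rows per panel (every combination of the `by` columns), sparsest panels first
panel_counts <- function(df, by = c("sn", "season", "hour")) {
  counts <- as.data.frame(table(df[by]), responseName = "rows")
  counts[order(counts$rows), , drop = FALSE]
}
```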